Northern North Sea
We thank R1 for pointing out some exposition issues and the proposed
We thank the reviewers for their detailed and helpful reviews. Table 1 shows the results. If we understand correctly, R2's main concern is that the word embeddings of We believe that it would hardly happen. The reasons are as follows. Second, we can easily assume an FSL scenario in which we have access to the labels of the test set.
On Statistical Estimation of Edge-Reinforced Random Walks
Ding, Qinghua, Anantharam, Venkat
Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs}, reinforced PageRank~\cite{gly}, and modeling animal behaviors~\cite{smouse}, among others. However, statistical estimation of the parameters governing RRWs remains underexplored. This work focuses on estimating the initial edge weights of ERRWs using observed trajectory data. Leveraging the connections between an ERRW and a random walk in a random environment (RWRE)~\cite{mr, mr2}, as given by the so-called "magic formula", we propose an estimator based on the generalized method of moments. To analyze the sample complexity of our estimator, we exploit the hyperbolic Gaussian structure embedded in the random environment to bound the fluctuations of the underlying random edge conductances.
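As a rough illustration of the kind of trajectory data the abstract refers to, the following sketch simulates a linearly edge-reinforced random walk: the walker picks an incident edge with probability proportional to its current weight, and the traversed edge's weight is then increased. This is only the forward model, not the authors' estimator; the function name `simulate_errw`, the reinforcement increment `delta=1.0`, and the dictionary-based graph encoding are assumptions made for this example.

```python
import random
from collections import defaultdict


def simulate_errw(initial_weights, start, steps, delta=1.0, seed=0):
    """Simulate a linearly edge-reinforced random walk (illustrative sketch).

    initial_weights maps frozenset({u, v}) -> positive initial weight of the
    undirected edge {u, v} (no self-loops). delta is the amount added to an
    edge each time it is traversed (an assumption of this sketch).
    """
    rng = random.Random(seed)
    weights = dict(initial_weights)  # copy so the caller's dict is untouched

    # Build adjacency lists from the edge set.
    neighbours = defaultdict(list)
    for edge in weights:
        u, v = tuple(edge)
        neighbours[u].append(v)
        neighbours[v].append(u)

    path = [start]
    current = start
    for _ in range(steps):
        options = neighbours[current]
        edge_weights = [weights[frozenset((current, n))] for n in options]
        # Transition probability is proportional to the current edge weight.
        nxt = rng.choices(options, weights=edge_weights, k=1)[0]
        # Reinforce the edge that was just crossed.
        weights[frozenset((current, nxt))] += delta
        path.append(nxt)
        current = nxt
    return path, weights


# Toy usage: a triangle graph with unit initial weights.
triangle = {
    frozenset((0, 1)): 1.0,
    frozenset((1, 2)): 1.0,
    frozenset((0, 2)): 1.0,
}
trajectory, final_weights = simulate_errw(triangle, start=0, steps=100)
```

The estimation problem in the abstract runs in the opposite direction: given trajectories such as `trajectory`, recover the initial weights passed in as `initial_weights`.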
- Europe > United Kingdom > North Sea > Northern North Sea (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report (0.49)
- Workflow (0.46)
Training Free Guided Flow Matching with Optimal Control
Wang, Luran, Cheng, Chaoran, Liao, Yizhen, Qu, Yanru, Liu, Ge
Controlled generation with pre-trained Diffusion and Flow Matching models has vast applications. One strategy for guiding ODE-based generative models is through optimizing a target loss $R(x_1)$ while staying close to the prior distribution. Along this line, some recent work showed the effectiveness of guiding flow models by differentiating through their ODE sampling process. Despite the superior performance, the theoretical understanding of this line of methods is still preliminary, leaving space for algorithm improvement. Moreover, existing methods predominantly focus on Euclidean data manifolds, and there is a compelling need for guided flow methods on complex geometries such as SO(3), which prevails in high-stakes scientific applications like protein design. We present OC-Flow, a general and theoretically grounded training-free framework for guided flow matching using optimal control. Building upon advances in optimal control theory, we develop effective and practical algorithms for solving optimal control in guided ODE-based generation and provide a systematic theoretical analysis of the convergence guarantee in both Euclidean and SO(3). We show that existing backprop-through-ODE methods can be interpreted as special cases of Euclidean OC-Flow. OC-Flow achieved superior performance in extensive experiments on text-guided image manipulation, conditional molecule generation, and all-atom peptide design.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Europe > United Kingdom > North Sea > Northern North Sea (0.04)
What is Wrong with Perplexity for Long-context Language Modeling?
Fang, Lizhe, Wang, Yifei, Liu, Zhaoyang, Zhang, Chenheng, Jegelka, Stefanie, Gao, Jinyang, Ding, Bolin, Wang, Yisen
Handling long-context inputs is crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. We find that PPL overlooks key tokens, which are essential for long-context understanding, by averaging across all tokens and thereby obscuring the true performance of models in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them. Our experiments demonstrate that LongPPL strongly correlates with performance on various long-context benchmarks (e.g., Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. Additionally, we introduce LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens, leading to consistent improvements across diverse benchmarks. In summary, these contributions offer deeper insights into the limitations of PPL and present effective solutions for accurately evaluating and enhancing the long-context capabilities of LLMs. Code is available at https://github.com/PKU-ML/LongPPL.
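To make the averaging argument concrete, the sketch below contrasts ordinary perplexity with a perplexity restricted to "key" tokens. The abstract does not state the exact selection rule, so the criterion used here (a token is key when its log-probability improves by more than `gap_threshold` when the long context is visible) is an assumption, as are the function names and the toy numbers. The reference implementation is the repository linked in the abstract.

```python
import math


def perplexity(logprobs):
    """Standard perplexity: exp of the negative mean token log-probability."""
    return math.exp(-sum(logprobs) / len(logprobs))


def key_token_perplexity(logp_long, logp_short, gap_threshold=2.0):
    """Perplexity over tokens that benefit strongly from the long context.

    logp_long[i]  : log-prob of token i given the full (long) context
    logp_short[i] : log-prob of token i given only a truncated (short) context
    gap_threshold : hypothetical cutoff on the long-short improvement
    """
    key = [
        lp for lp, sp in zip(logp_long, logp_short)
        if lp - sp > gap_threshold
    ]
    if not key:
        raise ValueError("no key tokens found at this threshold")
    return perplexity(key), len(key)


# Toy numbers (made up for illustration): three easy tokens and one token
# that is much more predictable once the long context is available.
logp_long = [-0.1, -0.2, -3.0, -0.1]
logp_short = [-0.1, -0.3, -6.0, -0.2]

plain_ppl = perplexity(logp_long)
long_ppl, n_key = key_token_perplexity(logp_long, logp_short)
```

In this toy case only the third token is selected, so `long_ppl` reflects its log-probability alone, whereas `plain_ppl` is dominated by the three easy tokens. That is the masking effect the abstract attributes to averaging across all tokens.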
- North America > United States (1.00)
- Asia > Middle East > Iraq (0.04)
- Asia > Middle East > Syria (0.04)
- (11 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy > Power Industry (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference
Masserano, Luca, Shen, Alex, Doro, Michele, Dorigo, Tommaso, Izbicki, Rafael, Lee, Ann B.
An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data $\mathbf{X}$ as covariates leads to biased predictions and invalid uncertainty estimates of labels $Y$. We overcome these biases by proposing a new method for robust uncertainty quantification that casts classification as a hypothesis testing problem under nuisance parameters. The key idea is to estimate the classifier's receiver operating characteristic (ROC) across the entire nuisance parameter space, which allows us to devise cutoffs that are invariant under GLS. Our method effectively endows a pre-trained classifier with domain adaptation capabilities and returns valid prediction sets while maintaining high power. We demonstrate its performance on two challenging scientific problems in biology and astroparticle physics with data from realistic mechanistic models.
- Europe > Italy (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > North Sea > Northern North Sea (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
Zhou, Giulio, Lam, Tsz Kin, Birch, Alexandra, Haddow, Barry
Speech-to-Text Translation (S2TT) has typically been addressed with cascade systems, where speech recognition systems generate a transcription that is subsequently passed to a translation model. While there has been a growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively establish the advantages of integrating the acoustic signal directly into the translation process. This work proposes using contrastive evaluation to quantitatively measure the ability of direct S2TT systems to disambiguate utterances where prosody plays a crucial role. Specifically, we evaluated Korean-English translation systems on a test set containing wh-phrases, for which prosodic features are necessary to produce translations with the correct intent, whether it's a statement, a yes/no question, a wh-question, and more. Our results clearly demonstrate the value of direct translation systems over cascade translation models, with a notable 12.9% improvement in overall accuracy in ambiguous cases, along with up to a 15.6% increase in F1 scores for one of the major intent categories. To the best of our knowledge, this work stands as the first to provide quantitative evidence that direct S2TT models can effectively leverage prosody. The code for our evaluation is openly accessible and freely available for review and utilisation.
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > Dominican Republic (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- (6 more...)
Deep Learning for Gamma-Ray Bursts: A data driven event framework for X/Gamma-Ray analysis in space telescopes
The HERMES (High Energy Rapid Modular Ensemble of Satellites) Pathfinder mission serves as an in-orbit demonstration of a constellation of nanosatellites whose primary scientific purpose is to discover intense high-energy transients, such as gamma-ray bursts, across a broad energy range (few keV to few MeV) with unparalleled temporal precision and exact localisation. By 2024, the first constellation of six nanosatellites is expected to be launched. To fully exploit satellite data and allow faint astronomical events to emerge, a precise estimation of satellite background count rates is required to determine whether an event is statistically valid or not. The dynamics of the background are related to the satellite's orbital information, which varies on the order of minutes, potentially hiding long transient events. This work makes two main contributions. First, it presents a novel background estimator that could potentially be fitted to any type of X/Gamma-ray satellite space telescope, capable of capturing long-term dynamics and accurate enough to detect faint transients. This estimator is built using a Neural Network and tested on data from the Fermi Gamma-ray Space Telescope's Gamma Burst Monitor (GBM). Second, a trigger algorithm called FOCuS (Functional Online CUSUM) is employed to extract events from the background using the background estimator. The resulting framework, DeepGRB, can identify astronomical events that are both present and absent from the Fermi-GBM catalog. The analysis of the discovered events reveals the strengths and weaknesses of the framework.
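The idea of triggering on counts relative to an estimated background can be sketched with a classic one-sided Poisson CUSUM. This is not FOCuS itself, which the abstract names but does not describe; it is the simpler fixed-intensity CUSUM, shown only to illustrate how a background estimate feeds a trigger. The function name, the intensity multiplier `mu`, and the threshold are assumptions for this example.

```python
import math


def poisson_cusum(counts, background, mu=1.5, threshold=5.0):
    """One-sided Poisson CUSUM against a per-bin background estimate.

    counts[t]     : observed photon counts in time bin t
    background[t] : estimated background counts in bin t (for example,
                    the output of a background model)
    mu            : assumed source intensity as a multiple of background (> 1)
    threshold     : trigger level for the cumulative statistic

    Returns the index of the first bin where the statistic exceeds the
    threshold, or None if it never does.
    """
    stat = 0.0
    log_mu = math.log(mu)
    for t, (x, b) in enumerate(zip(counts, background)):
        # Log-likelihood ratio of Poisson(mu * b) versus Poisson(b) for x.
        stat = max(0.0, stat + x * log_mu - b * (mu - 1.0))
        if stat > threshold:
            return t
    return None


# Toy usage: flat background of 10 counts per bin, with a transient that
# triples the rate from bin 5 onwards.
background = [10.0] * 10
counts = [10] * 5 + [30] * 5
trigger_bin = poisson_cusum(counts, background)  # -> 5
```

A fixed `mu` makes the detector most sensitive to transients of roughly that strength, so in practice the multiplier has to be chosen or scanned.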
- Oceania > Australia (0.04)
- North America > United States > Texas > Erath County (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- (17 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.92)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- Energy (1.00)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- (2 more...)
INFAMOUS-NeRF: ImproviNg FAce MOdeling Using Semantically-Aligned Hypernetworks with Neural Radiance Fields
Hou, Andrew, Liu, Feng, Ren, Zhiyuan, Sarkis, Michel, Bi, Ning, Tong, Yiying, Liu, Xiaoming
We propose INFAMOUS-NeRF, an implicit morphable face model that introduces hypernetworks to NeRF to improve the representation power in the presence of many training subjects. At the same time, INFAMOUS-NeRF resolves the classic hypernetwork tradeoff of representation power and editability by learning semantically-aligned latent spaces despite the subject-specific models, all without requiring a large pretrained model. INFAMOUS-NeRF further introduces a novel constraint to improve NeRF rendering along the face boundary. Our constraint can leverage photometric surface rendering and multi-view supervision to guide surface color prediction and improve rendering near the surface. Finally, we introduce a novel, loss-guided adaptive sampling method for more effective NeRF training by reducing the sampling redundancy. We show quantitatively and qualitatively that our method achieves higher representation power than prior face modeling methods in both controlled and in-the-wild settings. Code and models will be released upon publication.
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > North Sea > Northern North Sea (0.04)
- Europe > United Kingdom > North Sea > Central North Sea (0.04)
- (2 more...)
Quality-Diversity through AI Feedback
Bradley, Herbie, Dai, Andrew, Teufel, Hannah, Zhang, Jenny, Oostermeijer, Koen, Bellagente, Marco, Clune, Jeff, Stanley, Kenneth, Schott, Grégory, Lehman, Joel
In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.
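The search loop described in the abstract can be sketched as an archive-based evolutionary algorithm in which variation, quality scoring, and diversity labelling are all pluggable functions. The abstract does not specify the archive structure, so the MAP-Elites-style "one best candidate per diversity bin" rule below is an assumption. The toy `mutate`, `quality`, and `diversity_bin` functions are stand-ins for the language-model calls and bear no relation to the real prompts.

```python
import random


def qd_search(seed_texts, mutate, quality, diversity_bin, iterations=100, seed=0):
    """Archive-based quality-diversity search (illustrative sketch).

    mutate(text, rng)   -> new candidate text (an LM call in QDAIF)
    quality(text)       -> numeric quality score (AI feedback in QDAIF)
    diversity_bin(text) -> hashable bin label (AI feedback in QDAIF)
    """
    rng = random.Random(seed)
    archive = {}  # bin label -> (quality, text)

    def consider(text):
        label, score = diversity_bin(text), quality(text)
        # Keep only the best candidate seen so far in each bin.
        if label not in archive or score > archive[label][0]:
            archive[label] = (score, text)

    for text in seed_texts:
        consider(text)
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        consider(mutate(parent, rng))
    return archive


# Toy stand-ins for the LM-based operators (hypothetical, for illustration).
WORDS = ["storm", "river", "lantern", "quiet", "ember", "harbour"]


def toy_mutate(text, rng):
    return text + " " + rng.choice(WORDS)


def toy_quality(text):
    return len(set(text.split()))  # reward distinct words


def toy_diversity_bin(text):
    return min(len(text.split()) // 3, 4)  # bin by length, 5 bins


archive = qd_search(["quiet river"], toy_mutate, toy_quality, toy_diversity_bin)
```

The number of filled bins in `archive`, together with the scores stored in them, corresponds to the coverage of the search space with high-quality samples that the abstract reports.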
- North America > United States > Wyoming > Natrona County (0.14)
- North America > United States > Texas > Yoakum County (0.14)
- North America > United States > Texas > Gaines County (0.14)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Promising Solution (0.87)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Health & Medicine > Consumer Health (1.00)
- (4 more...)